THE OPTIMAL SINK AND THE BEST SOURCE IN A MARKOV CHAIN

YURI BAKHTIN AND LEONID BUNIMOVICH 




Abstract. It is well known that the distributions of hitting times in
Markov chains are quite irregular, unless the limit as time tends to
infinity is considered. We show that nevertheless for a typical finite
irreducible Markov chain and for nondegenerate initial distributions the
tails of the distributions of the hitting times for the states of a Markov
chain can be ordered, i.e., they do not overlap after a certain finite
moment of time. If one considers instead each state of a Markov chain
as a source rather than a sink then again the states can generically be
ordered according to their efficiency. The mechanisms underlying these
two orderings are essentially different though. Our results can be used,
e.g., for a choice of the initial distribution in numerical experiments with
the fastest convergence to equilibrium/stationary distribution, for
characterization of the elements of a dynamical network according to their
ability to absorb and transmit the substance ("information") that is
circulated over the network, for determining optimal stopping moments
(stopping signals/words) when dealing with sequences of symbols, etc.

1. Introduction

Hitting and recurrence times are a classical subject in the theory of
random processes. However, the relevant studies have always been concerned
with averages (expectations) of hitting and recurrence times and relations
between their distributions for a fixed state [12], [7], [10], [15], [1],
[9], [8], [12], [5].

It is well known that the distributions of recurrence times are quite regular
for many random processes and dynamical systems [10], [9], [8], [5]. On the
other hand, the distribution functions of the first hitting times are very
irregular [7], [13], [1], [S], [H], [5], [33]. This seems to be natural
because an ergodic process returns to any set of positive measure infinitely
many times with probability 1, while the hitting event occurs only once.

Therefore, our first result, that for a typical irreducible Markov chain and
for a typical initial distribution the distribution tails of the first
hitting times for the states of the chain can be ordered, is quite
surprising. This striking regularity property means that there is a finite
moment of time $n_0$ such that the tails of the survival probabilities
$P_i(n)$, $n > n_0$, $i = 1, 2, \ldots, N$, form an ordered set, i.e.,
$P_{\sigma_1}(n) < P_{\sigma_2}(n) < \ldots < P_{\sigma_N}(n)$ for all
$n > n_0$, where $\sigma_i \in \{1, 2, \ldots, N\}$ for all $i$. From this
point of view, the state $\sigma_1$ is the most efficient sink (absorber of
"information") out of all the states of the Markov chain.

The question of the choice of the best (worst) sink naturally arises in the
theory of dynamical networks. A dynamical network is a dynamical system
that is generated by individual dynamics of its elements (cells, power
stations, neurons, etc.), the interactions between these elements and the
structure of the graph of interactions (often called the topology of the
network). These three characteristics determine the long term dynamics of a
network [3].

Traditionally, the theory of dynamical systems deals with properties that
are asymptotic in time ($t \to \infty$). It was recently found [4], though,
that it is also possible to effectively answer some natural questions on
finite time dynamics. For instance, placing a hole in a proper place in the
phase space of a chaotic dynamical system guarantees that the survival
probabilities for this hole are, for all times $n > n_0$, smaller than for
other holes of the same size (measure).

However, it was recently discovered [1], [2] that it is possible to make
finite time predictions of dynamics even if there are no small/large
parameters in the equations which govern the dynamics of a system.

The results of the present paper (as the ones in [1], [2]) are not only
generally unexpected but often counterintuitive as well. For instance, we
provide examples where the best sink or source is not the state with the
maximal equilibrium/stationary probability.

It is always tempting and important to try to characterize elements of
networks by their ability to absorb and transmit "information". By combining
the ideas and approaches of [1], [H] it was shown in [2] that, indeed, one
can characterize the elements of networks by their ability to leak
"information" out of the system. Thus the elements of networks can be
characterized by their dynamical properties rather than by standard static
characteristics like centrality, betweenness, etc., which are based only on
the topology of a network rather than on its dynamics.

Typically, chaotic dynamical systems, even the most chaotic ones, have a
fast decaying but still infinite memory. Therefore, studies of statistical
properties of dynamical systems always make use of results of probability
theory and, if needed, require proving modifications of the existing limit
theorems, etc. This is a very natural approach because the memory of a
chaotic dynamical system is (most often) infinite, and such systems are
approximated by random processes with a finite memory. However, even for
such random processes the standard approach is to analyze only their
asymptotic in time properties. We show here, though, that some interesting
finite time properties of random processes can also be rigorously studied.
For instance, our results show that for hitting times one can find not only
relations between their averages, but also between their distribution
functions. It turns out that the infinite tails of these distributions never
overlap after a finite moment of time that can be effectively computed. Our
results also generalize those of [1], [2] to an essentially larger class of
dynamical systems.

However, the question that we address in this paper seems to have never
been considered even in the theory of Markov chains. Our results show that
for hitting times one can find not only relations between their averages, but
even between their distributions.

Another problem considered in this paper is to find the most efficient
source in a Markov chain. To the best of our knowledge, this problem has
not been addressed before. It is also motivated by dynamical networks,
where the following question is of utmost importance: to which node (element
of a network) should one apply a perturbation in order to achieve the
strongest effect? We show that for a typical irreducible Markov chain, there
also exists a hierarchy of its states with respect to the rate at which the
initial perturbation converges to the stationary state. Thus one can find
an optimal node to apply a perturbation to in order to achieve the fastest
relaxation.

Again, typically there exists a finite moment of time after which the
states of a Markov chain form an ordered set with respect to their ability
to transmit information to the entire chain (network), i.e., to serve as
sources. Generalizations to the case when a sink/source consists not of one
but of several states of a Markov chain are straightforward. Another
straightforward generalization (although important for applications where
one deals, e.g., with a network of chemical reactions, supply chains, etc.)
deals with nonnegative matrices (rather than transition probability
matrices) and uses the Perron-Frobenius theorem instead of the Markov
theorem. The results of our paper could be used, e.g., for choosing an
appropriate (e.g., the fastest convergent) initial distribution in a computer
experiment, for choosing an appropriate sequence of stopping/observing times
when dealing with sequences of symbols, and for dynamical characterization
of the elements of networks.

These finite time probabilistic predictions made it clear that some
natural basic questions have never been addressed in the theory of stochastic
processes, even for Markov chains. This gap should be filled.



2. Most efficient sink 

It is intuitively clear that for most Markov chains some of the states are 
more important for the dynamics than the others. The goal of this section is 
to introduce and study a measure of importance of the states based on the 
escape rate through a state (or a family of states, since this generalization
of our approach is straightforward).

Let $P = (P_{ij})_{i,j=1}^N$ be the transition probability matrix of an
irreducible Markov chain (see, e.g., [6, Chapter XV]) on the state space
$\{1, \ldots, N\}$ for some $N \in \mathbb{N}$.

Let us fix $k \in \{1, \ldots, N\}$ and stop our Markov chain as soon as it
reaches state $k$. In other words, whenever the original Markov chain makes a
transition to $k$, it gets killed, so that the state $k$ can be considered as
a cemetery state for the Markov chain, or a hole through which the mass leaks
out of the system.

There are at least two equivalent ways to describe the resulting dynamics.
One is to treat the new system as a new Markov chain with absorbing state $k$
and introduce the associated transition matrix $P^{(k)}$ by

$$P^{(k)}_{ij} = \begin{cases} P_{ij}, & i \ne k, \\ 1, & j = i = k, \\ 0, & j \ne i = k. \end{cases}$$

Another way is to introduce a matrix $Q^{(k)} = (Q^{(k)}_{ij})_{i,j \ne k}$
obtained from $P$ by crossing out its $k$-th row and column. The matrix
$P^{(k)}$ is a stochastic matrix whereas $Q^{(k)}$ is strictly substochastic
(or sub-Markov) since it does not account for the mass leaking out through
the state $k$.
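The two descriptions can be sketched in a few lines of Python. This is only an illustration: the 3-state matrix `P` below is hypothetical, the helper name `absorbing_and_reduced` is not from the paper, and states are indexed from 0, so that the paper's state 1 is index 0.

```python
from fractions import Fraction as F

def absorbing_and_reduced(P, k):
    """Return (P_k, Q_k) for a hole at state k (0-based index).

    P_k: P with row k replaced by the unit vector at k (k is absorbing).
    Q_k: P with row k and column k crossed out (sub-Markov matrix).
    """
    n = len(P)
    P_k = [row[:] for row in P]
    P_k[k] = [F(1) if j == k else F(0) for j in range(n)]
    Q_k = [[P[i][j] for j in range(n) if j != k] for i in range(n) if i != k]
    return P_k, Q_k

# Hypothetical 3-state chain, chosen only for illustration.
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 4), F(1, 4), F(1, 2)]]

P_k, Q_k = absorbing_and_reduced(P, 0)
row_sums_P = [sum(row) for row in P_k]  # rows of the absorbing chain
row_sums_Q = [sum(row) for row in Q_k]  # rows of the sub-Markov matrix
```

In this example every row of `P_k` sums to 1, while each row of `Q_k` sums to less than 1 because every remaining state can jump directly into the hole.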

We assume that even after the removal of an arbitrary state $k$ the system
remains irreducible and aperiodic, i.e., for some $n_0 = n_0(k)$ and all
$n > n_0$, all entries of the matrix $(Q^{(k)})^n$ are positive.

Remark 1. The aperiodicity assumption is standard, see, e.g., [6, Section
XV.9], and we make it to avoid unnecessary although straightforward
technicalities.

Let us denote the simplex of all probability distributions on
$\{1, \ldots, N\}$ by $\Delta_N$. Suppose we are given the initial
distribution $p = (p_1, \ldots, p_N) \in \Delta_N$. After $n$ steps, the
distribution of the Markov chain with a hole at state $k$ is given by
$p (P^{(k)})^n$. The irreducibility of $P$ implies that, as $n \to \infty$,
this distribution converges to the one concentrated at $k$. This is the only
stationary distribution, i.e., the only eigenvector corresponding to the
leading eigenvalue 1 of the stochastic matrix $P^{(k)}$. The rate of
convergence to this obvious equilibrium is characterized by the second
largest eigenvalue, $\mu_k < 1$. It is easy to see that the spectrum of
$P^{(k)}$ coincides with that of $Q^{(k)}$ except for a simple eigenvalue 1.
Therefore, $\mu_k$ is also the leading positive eigenvalue of the matrix
$Q^{(k)}$. In our setting, the classical Perron-Frobenius (PF) theorem
guarantees that $\mu_k$ is simple and there is an associated eigenvector
$q^{(k)} = (q^{(k)}_i)_{i \ne k}$ with all positive coordinates.
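One way to see the relation between the two spectra is to reorder the states so that $k$ comes last. The column vector $b$ below is notation introduced only for this sketch; it collects the one-step probabilities of jumping into the hole.

```latex
\[
P^{(k)} =
\begin{pmatrix}
Q^{(k)} & b \\
0 & 1
\end{pmatrix},
\qquad b_i = P_{ik}, \quad i \ne k,
\]
\[
\det\bigl(P^{(k)} - zI\bigr) = (1 - z)\,\det\bigl(Q^{(k)} - zI\bigr).
\]
```

The matrix is block upper triangular, so its characteristic polynomial is that of $Q^{(k)}$ times the factor $(1 - z)$ contributed by the absorbing state.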

We can choose $q^{(k)}$ so that besides the equality

(1) $q^{(k)} Q^{(k)} = \mu_k q^{(k)},$

it satisfies

$$\sum_{i \ne k} q^{(k)}_i = 1,$$

thus defining a probability distribution. Notice that (1) is exactly the
definition of a quasi-stationary distribution for the sub-Markov kernel
$Q^{(k)}$. Since the matrix is sub-Markov, there is no stationary
distribution, and the total mass of a vector $qQ^{(k)}$ may be less than 1
for a probability vector $q$. However, if we normalize the distribution to
have total mass 1 after each step then we end up with the notion of
quasi-stationary distributions defined by (1). This equation means that under
the quasi-stationary distribution, the total mass that has not leaked through
$k$ is multiplied by $\mu_k < 1$ at every step. Therefore,
$\lambda_k = -\ln \mu_k$ can serve as the escape rate through $k$. It can
happen that $\mu_k = 0$; in this case, all mass escapes the system in
finitely many steps, and we set $\lambda_k = \infty$.
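The "normalize after each step" description suggests a simple numerical procedure. The sketch below is an illustration rather than the authors' method: it uses a normalized power iteration, the 2x2 sub-Markov matrix `Q` is hypothetical, and it assumes that some power of `Q` has all positive entries.

```python
import math

def quasi_stationary(Q, steps=2000):
    """Normalized power iteration q <- qQ / |qQ|_1 for a row vector q.

    Returns (mu, q): an estimate of the leading eigenvalue and of the
    quasi-stationary distribution. Assumes Q is nonnegative and primitive.
    """
    m = len(Q)
    q = [1.0 / m] * m
    mu = 0.0
    for _ in range(steps):
        nxt = [sum(q[i] * Q[i][j] for i in range(m)) for j in range(m)]
        mu = sum(nxt)              # mass surviving one step when starting from q
        q = [x / mu for x in nxt]  # renormalize to total mass 1
    return mu, q

# Hypothetical sub-Markov matrix (rows sum to less than 1), illustration only.
Q = [[1 / 3, 1 / 3],
     [1 / 4, 1 / 2]]
mu, q = quasi_stationary(Q)
escape_rate = -math.log(mu)
```

For this `Q` the leading eigenvalue can be checked by hand from the characteristic polynomial: it equals $(5 + \sqrt{13})/12$.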

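For $q = (q_i)_{i \ne k}$, we define $M_n^{(k)}(q)$ to be the survival
probability, or the total mass remaining in the sub-Markov chain defined by
$Q^{(k)}$ after $n$ steps:

$$M_n^{(k)}(q) = \sum_{j \ne k} \bigl(q (Q^{(k)})^n\bigr)_j.$$

If $p = (p_1, \ldots, p_N)$, we define $p^{(k)}$ to be the
$(N-1)$-dimensional vector $(p_i)_{i \ne k}$ and denote

$$M_n^{(k)}(p) = M_n^{(k)}(p^{(k)}).$$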

Since every nonzero vector with nonnegative components has a nontrivial 
positive component in the direction of the PF eigenvector, the following 
statement holds true. 

Theorem 1. Let $\lambda_k < \infty$ for some $k \in \{1, \ldots, N\}$. Then
for any $p \in \Delta_N$ with $p_k < 1$, there are positive numbers $c_1(p)$,
$c_2(p)$ depending only on $p$ such that

$$c_1(p) e^{-\lambda_k n} < M_n^{(k)}(p) < c_2(p) e^{-\lambda_k n}.$$
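One possible route to these bounds, not necessarily the authors' argument, uses a right Perron-Frobenius eigenvector. The notation is an assumption of this sketch: $h$ is a column vector with positive entries satisfying $Q^{(k)} h = \mu_k h$, $p^{(k)}$ is the vector $p$ with its $k$-th coordinate removed, and $M_n^{(k)}(p)$ is the total mass of $p^{(k)} (Q^{(k)})^n$.

```latex
\[
p^{(k)} \bigl(Q^{(k)}\bigr)^n h = \mu_k^n \, p^{(k)} h ,
\qquad
\Bigl(\min_i h_i\Bigr) M_n^{(k)}(p)
\;\le\; p^{(k)} \bigl(Q^{(k)}\bigr)^n h
\;\le\; \Bigl(\max_i h_i\Bigr) M_n^{(k)}(p),
\]
\[
\frac{p^{(k)} h}{\max_i h_i}\, e^{-\lambda_k n}
\;\le\; M_n^{(k)}(p) \;\le\;
\frac{p^{(k)} h}{\min_i h_i}\, e^{-\lambda_k n},
\qquad \lambda_k = -\ln \mu_k .
\]
```

Since $p_k < 1$, the vector $p^{(k)}$ is nonzero and $p^{(k)} h > 0$, so both constants are positive. Any smaller lower constant and any larger upper constant give strict inequalities.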

The next corollary compares leaking through different holes. 

Corollary 1. (1) If $\lambda_i > \lambda_j$, then for any
$p, q \in \Delta_N$ with $q_j < 1$, there is $n_0 = n_0(p, q) \in \mathbb{N}$
such that for all $n > n_0$,

$$M_n^{(i)}(p) < M_n^{(j)}(q).$$

(2) Suppose $\sigma$ is a permutation on $\{1, \ldots, N\}$ such that

$$\lambda_{\sigma_N} < \ldots < \lambda_{\sigma_1} < \infty.$$

Then for any family of distributions $(p(i) \in \Delta_N,\ i = 1, \ldots, N)$
satisfying $p_i(i) < 1$ for all $i$, there is
$n_0 = n_0(p(1), \ldots, p(N)) \in \mathbb{N}$ such that for all $n > n_0$,

$$M_n^{(\sigma_1)}(p(\sigma_1)) < M_n^{(\sigma_2)}(p(\sigma_2)) < \ldots < M_n^{(\sigma_N)}(p(\sigma_N)).$$

(3) Suppose the state $i \in \{1, \ldots, N\}$ is such that
$\lambda_i > \lambda_k$ for all $k \ne i$. For any $p \in \Delta_N$ and any
$k \ne i$, if $p_k < 1$ then there is a time $n_0 = n_0(p)$ such that for all
$n > n_0$,

$$M_n^{(i)}(p) < M_n^{(k)}(p).$$

Proof: The first part follows directly from Theorem 1. The other two
parts are consequences of the first one. □

Figure 1. Piecewise linear map generating the Markov
chain of Example 1.

Example 1. Not only the size (stationary probability) of a state of the
Markov chain matters for the escape rate through that state. Consider a
Markov chain with transition matrix

$$P = \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/6 & 5/12 & 5/12 \\ 1/2 & 1/4 & 1/4 \end{pmatrix}.$$

The stationary distribution for this Markov chain is uniform, i.e.,
$\pi_i = 1/3$, $i = 1, 2, 3$. However, the leading eigenvalues of the reduced
matrices $Q^{(i)}$, $i = 1, 2, 3$, are different. Namely, $\mu_1 = 2/3$,
$\mu_2 = (7 + \sqrt{97})/24$, $\mu_3 = (9 + \sqrt{33})/24$. Therefore, the
fastest escape is through the hole in the third state and the slowest one is
through the hole in the second state. This example belongs to a more general
class than the one considered in [4].
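The eigenvalues quoted in Example 1 can be checked numerically. In the sketch below the matrix entries are an assumption: they are the ones for which the uniform stationary distribution and the three quoted eigenvalues come out right. The helper name is hypothetical, and indices are 0-based.

```python
import math
from fractions import Fraction as F

# Assumed transition matrix: doubly stochastic and consistent with the
# eigenvalues quoted in Example 1.
P = [[F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 6), F(5, 12), F(5, 12)],
     [F(1, 2), F(1, 4), F(1, 4)]]

def leading_eigenvalue_after_removal(P, k):
    """Largest eigenvalue of the 2x2 matrix left after deleting row/column k."""
    keep = [i for i in range(3) if i != k]
    (a, b), (c, d) = [[P[i][j] for j in keep] for i in keep]
    tr, det = a + d, a * d - b * c
    # Roots of z^2 - tr*z + det; the discriminant is positive here.
    return (float(tr) + math.sqrt(float(tr * tr - 4 * det))) / 2

mu = [leading_eigenvalue_after_removal(P, k) for k in range(3)]
escape_rates = [-math.log(m) for m in mu]
fastest = max(range(3), key=lambda k: escape_rates[k]) + 1  # 1-based state
slowest = min(range(3), key=lambda k: escape_rates[k]) + 1
```

With these entries the fastest escape is through state 3 and the slowest through state 2, in agreement with the example, even though all three states have the same stationary probability.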

Remark 2. This Markov chain is generated, e.g., by a piecewise linear map
$f : [0, 1] \to [0, 1]$ shown in Fig. 1. States 1, 2, 3 correspond to the
intervals $[0, 1/3)$, $[1/3, 2/3)$, and $[2/3, 1]$, respectively, and the
stationary measure is Lebesgue measure.

The Markov chain in the next example is also generated by a certain 1D
expanding piecewise linear map. For the sake of brevity we do not present
it here though.



Example 2. It is possible that the escape is slower through a state with
greater stationary probability (bigger "hole"). Consider a Markov chain
with transition matrix

$$P = \begin{pmatrix} 1/2 & 1/12 & 5/12 \\ 1/2 & 0 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}.$$

The stationary distribution is given by the vector $(36/83, 14/83, 33/83)$.
The largest eigenvalues of the matrices $Q^{(i)}$, $i = 1, 2, 3$, equal
$\mu_1 = (1 + \sqrt{7})/6$, $\mu_2 = (5 + \sqrt{21})/12$,
$\mu_3 = (3 + \sqrt{15})/12$. Therefore, the escape through the third state
is faster than through the first one, although the stationary probability
("size") of the first state is larger than that of the third state. This
example belongs to a more general class of systems than the one considered
in [4].
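The finite-time ordering can be observed directly by iterating the reduced matrices in exact arithmetic. This is a sketch under stated assumptions: the matrix entries are those consistent with the stationary vector and eigenvalues quoted in Example 2, the initial distribution is taken to be the stationary one purely for illustration, and `survival` is a hypothetical helper returning the total mass that has not yet leaked through the hole.

```python
from fractions import Fraction as F

# Assumed transition matrix, consistent with the stationary vector and the
# eigenvalues quoted in Example 2.
P = [[F(1, 2), F(1, 12), F(5, 12)],
     [F(1, 2), F(0), F(1, 2)],
     [F(1, 3), F(1, 3), F(1, 3)]]

def survival(P, k, p, n_max):
    """Exact survival probabilities for n = 0..n_max with a hole at index k."""
    keep = [i for i in range(len(P)) if i != k]
    q = [p[i] for i in keep]
    out = [sum(q)]
    for _ in range(n_max):
        q = [sum(q[a] * P[keep[a]][keep[b]] for a in range(len(keep)))
             for b in range(len(keep))]
        out.append(sum(q))
    return out

pi = [F(36, 83), F(14, 83), F(33, 83)]  # stationary distribution
M1 = survival(P, 0, pi, 30)             # hole at state 1
M3 = survival(P, 2, pi, 30)             # hole at state 3
# Last step (up to 30) at which the ordering M3 < M1 fails, or -1 if never.
last_violation = max((n for n in range(31) if not M3[n] < M1[n]), default=-1)
```

At time 0 more mass sits outside state 3 than outside state 1, so the ordering fails there. From the first step on, the mass surviving the hole at state 3 stays below the mass surviving the hole at state 1.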

Remark 3. A generalization of Theorem 1 to the case of a non-stochastic 
but just non-negative matrix is straightforward since the Perron-Frobenius 
theorem is still applicable. 

Remark 4. It is straightforward to address the situations where our
assumption on irreducibility of the Markov chain after the removal of a
vertex is violated. For example, one can consider the case where besides one
strongly connected component $A$ satisfying our original set of assumptions
there are several extra vertices connected to $A$ but unreachable from $A$.
In this case, the rate of escape through any vertex of $A$ may depend on the
initial distribution. In fact, the rate is determined by the minimum of the
"internal" escape rate of $A$ through that vertex and the rates of escape to
$A$ from the vertices outside of $A$ that support the initial distribution.
Another situation that is easily analyzed directly arises when, after the
removal of a state, the remaining states form several isolated subsets.
Clearly, in this case the escape from each of these subsets should be treated
separately (and in complete analogy with the proof above). It is also easy to
see that, when the Markov chain obtained after the removal of a state is
reducible, nothing besides these two situations can occur.

3. Most efficient source 

In this section we classify the states of a Markov chain with respect to 
their efficiency in distributing the information or any perturbation over the 
entire state space. Here we assume that the Markov chain is just irreducible 
and aperiodic. 

Let the evolution be initiated at state $k \in \{1, \ldots, N\}$. Then for
any step $n \ge 0$, the distribution of the Markov chain at time $n$ is given
by $e_k P^n$ where $e_k$ is the $k$-th coordinate vector and $P^n$ is the
$n$-step transition matrix. We can study the total variation distance between
the distribution at time $n$ and $\pi = (\pi_k)_{k=1}^N$, the stationary
distribution, the existence and uniqueness of which is guaranteed by the
Perron-Frobenius Theorem:

$$D_k(n) = |e_k P^n - \pi|_1, \quad k \in \{1, \ldots, N\},\ n \ge 0,$$

where $|v|_1 = \sum_{i=1}^N |v_i|$ is the $L^1$ norm of $v$. Ideally, we
would like to say that initial state $k_1$ allows for faster convergence to
the stationary distribution than initial state $k_2$ if there is
$n_0 \in \mathbb{N}$ such that $D_{k_1}(n) < D_{k_2}(n)$ for all $n > n_0$.
However, there are situations where this property holds due to the specific
choice of $|\cdot|_1$ to measure distances, and will be destroyed if one
replaces $|\cdot|_1$ with a different (equivalent) norm. So, we choose to
work with a partial order on states that does not depend on the concrete
choice of the norm in $\mathbb{R}^N$.

We denote by $\mathcal{N}$ the set of all norms on $\mathbb{R}^N$. We say
that a sequence of vectors $(v_n)_{n \in \mathbb{N}}$ in $\mathbb{R}^N$
dominates another sequence of vectors $(u_n)_{n \in \mathbb{N}}$, if for any
$H \in \mathcal{N}$, there is a number $n_0 = n_0(u, v, H)$ such that

$$H(u_n) < H(v_n), \quad n > n_0.$$

Obviously, $(v_n)_{n \in \mathbb{N}}$ does not dominate
$(u_n)_{n \in \mathbb{N}}$ iff there is $H \in \mathcal{N}$ and a sequence
$(n_m)_{m \in \mathbb{N}}$ increasing to infinity such that

$$H(u_{n_m}) \ge H(v_{n_m}), \quad m \in \mathbb{N}.$$

We shall say that initial state $k_1$ allows for faster convergence to the
stationary distribution than initial state $k_2$, if
$(e_{k_1} P^n - \pi)_{n \in \mathbb{N}}$ is dominated by
$(e_{k_2} P^n - \pi)_{n \in \mathbb{N}}$.

In general, we can say that an initial distribution $u$ allows for faster
convergence to the stationary distribution than initial distribution $v$, if
$(uP^n - \pi)_{n \in \mathbb{N}}$ is dominated by
$(vP^n - \pi)_{n \in \mathbb{N}}$. This introduces a partial order on
$\Delta_N$, and our goal is to give an equivalent definition of this partial
order in terms of projections on vectors in a (real) canonical Jordan basis
$(w_i)_{i=1}^N$ associated to $P$ (we refer to [11] for the background on
canonical forms).

We assume that $w_N = \pi$, the stationary distribution for $P$, a positive
eigenvector of $P$ with simple eigenvalue 1, the unique eigenvalue of $P$
equal to 1 in magnitude. To each $w_i$, $i = 1, \ldots, N-1$, we associate
$\lambda_i$ with $|\lambda_i| < 1$ and $\operatorname{Im} \lambda_i \ge 0$,
the eigenvalue of the generalized eigenspace that $w_i$ belongs to (since
complex eigenvalues come in conjugate pairs, we choose
$\operatorname{Im} \lambda_i \ge 0$). If $\lambda_i \in \mathbb{R}$, then we
define

$$r_i = \min\{r \in \mathbb{N} : w_i (P - \lambda_i I)^r = 0\}.$$

Recalling that for a nonreal eigenvalue $\lambda$, the canonical basis
vectors are grouped in pairs, we can define $r_i = r_j$ analogously for a
pair $(w_i, w_j)$ of canonical basis vectors corresponding to
$\lambda_i \notin \mathbb{R}$.

In both cases, the numbers $r_i$ enumerate the generalized eigenvectors
within one generalized eigenspace, and the pair $(\lambda_i, r_i)$ determines
the rate of decay of $w_i$ under iterations of $P$, namely, $\lambda_i$ is
the exponential rate of decay, and $r_i - 1$ is the degree of the polynomial
factor, see Lemma 1 below.

If $\mu > 0$ and $k \in \mathbb{N}$, we denote by $\Pi_{\mu,k}$ the
projection on the vector subspace spanned by all $w_i$ such that
$|\lambda_i| = \mu$ and $r_i = k$ (the projection is taken along the span of
all other vectors of the Jordan basis). If this subspace is empty, the
projection is assumed to be 0.

For two vectors $u, v \in \mathbb{R}^N$ we write $u \prec v$ if there is a
real number $a$ with $|a| < 1$ such that $u = av$.

Theorem 2. Initial distribution $u$ allows for faster convergence to the
stationary distribution than initial distribution $v$ if and only if there
are $\mu_0 \in (0, 1)$ and $r_0 \in \mathbb{N}$ such that the following
conditions are satisfied:

1. If either (i) $\mu \in (\mu_0, 1)$, or (ii) $\mu = \mu_0$ and $r > r_0$,
then $\Pi_{\mu,r} u = 0$.

2. $\Pi_{\mu_0,r_0} u \prec \Pi_{\mu_0,r_0} v$.

Remark 5. Intuitively, it is natural to think of $\mu_0$ as the second
largest eigenvalue of $P$. However, the theorem holds true even in the
degenerate situation where the projections of both $u$ and $v$ on the
eigenspace associated to the second largest eigenvalue vanish.

Corollary 2. Suppose that the image of the projection operator
$\Pi_{\mu_0,r_0}$ is 1-dimensional (this is guaranteed if the second largest
in magnitude eigenvalue of $P$ is real and simple). Let us denote
$q_i = |\Pi_{\mu_0,r_0} e_i|$, $i = 1, \ldots, N$. Suppose $\sigma$ is a
permutation on $\{1, \ldots, N\}$ such that

$$q_{\sigma_1} < \ldots < q_{\sigma_N}.$$

Then for any $H \in \mathcal{N}$ there is a number $n_0 = n_0(H)$ such that
for any $n > n_0$,

$$H(e_{\sigma_1} P^n - \pi) < \ldots < H(e_{\sigma_N} P^n - \pi),$$

so that for any $i, j$ with $i < j$, the initial state $\sigma_i$ allows for
faster convergence to the stationary distribution than the initial state
$\sigma_j$. In particular, the initial state $\sigma_1$ allows for faster
convergence than any other initial state.
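When the second largest eigenvalue is real, simple, and strictly larger in magnitude than the remaining ones, the numbers $q_i$ are proportional to $|h_i|$, where $h$ is a right eigenvector of $P$ for that eigenvalue. The sketch below ranks the states this way. The power iteration on a deflated matrix is a numerical shortcut chosen for this illustration, not a procedure from the paper, and `rank_sources` is a hypothetical helper. The matrix used is the one of Example 3 below, with 0-based indices.

```python
def rank_sources(P, iters=200):
    """Rank states by |h_i|, h a right eigenvector for the second eigenvalue.

    Assumes the second largest eigenvalue of P is real, simple and strictly
    dominates the remaining ones in magnitude. h is found by power iteration
    on the deflated matrix P - 1*pi, which has the same right eigenvectors
    as P for all eigenvalues other than 1.
    """
    n = len(P)
    pi = [1.0 / n] * n                       # stationary distribution
    for _ in range(2000):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    D = [[P[i][j] - pi[j] for j in range(n)] for i in range(n)]
    h = [float(i + 1) for i in range(n)]     # generic starting vector
    for _ in range(iters):
        h = [sum(D[i][j] * h[j] for j in range(n)) for i in range(n)]
        scale = max(abs(x) for x in h)
        h = [x / scale for x in h]
    q = [abs(x) for x in h]
    return sorted(range(n), key=lambda i: q[i]), q

P = [[1 / 8, 5 / 8, 1 / 4],
     [3 / 8, 9 / 16, 1 / 16],
     [1 / 24, 1 / 12, 7 / 8]]
order, q = rank_sources(P)   # order lists state indices from best source to worst
```

For this matrix the ranking puts state 1 first, then state 3, then state 2, which matches the coefficients 11/15, 1, and 17/15 found in Example 3.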

Remark 6. This hierarchy of states may fail to exist in the case where the
dimension of the image of $\Pi_{\mu_0,r_0}$ is greater than one, e.g., where
the second highest eigenvalue of $P$ is non-real, or where there are multiple
Jordan blocks associated to $\mu_0$.

Often, the best source state from the point of view of the hierarchy
established in Corollary 2 is the state with maximal stationary probability.
However, this is not necessarily so, as the following example shows.

Example 3. Suppose the transition probability matrix is

$$P = \begin{pmatrix} 1/8 & 5/8 & 1/4 \\ 3/8 & 9/16 & 1/16 \\ 1/24 & 1/12 & 7/8 \end{pmatrix}.$$

Then, there are three simple eigenvalues: $1$, $3/4$, and $-3/16$. Their
respective eigenvectors are: $\pi = (1/6, 1/3, 1/2)$,
$w_1 = (-1/6, -1/3, 1/2)$, and $w_2 = (-16/3, 13/3, 1)$. Notice that the
stationary probability is maximized by state 3 since
$\pi_3 > \pi_2 > \pi_1$. However, decomposing

$$e_1 = \pi - \frac{11}{15} w_1 - \frac{2}{15} w_2,$$

$$e_2 = \pi - \frac{17}{15} w_1 + \frac{1}{15} w_2,$$

$$e_3 = \pi + 1 \cdot w_1 + 0 \cdot w_2,$$

comparing the projections on $w_1$, and noticing that
$11/15 < 1 < 17/15$, we can use Theorem 2 to conclude that state 1 allows
for faster convergence than the two other states.
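The ordering can also be observed for the $L^1$ distances $D_k(n) = |e_k P^n - \pi|_1$ by iterating the chain in exact arithmetic. This is a sketch only: it checks a single norm over a finite horizon, whereas the theorem concerns all norms, and the helper name `l1_distances` is hypothetical.

```python
from fractions import Fraction as F

P = [[F(1, 8), F(5, 8), F(1, 4)],
     [F(3, 8), F(9, 16), F(1, 16)],
     [F(1, 24), F(1, 12), F(7, 8)]]
pi = [F(1, 6), F(1, 3), F(1, 2)]

def l1_distances(P, pi, k, n_max):
    """D_k(n) = |e_k P^n - pi|_1 for n = 0..n_max, in exact arithmetic."""
    size = len(P)
    row = [F(int(j == k)) for j in range(size)]
    out = []
    for _ in range(n_max + 1):
        out.append(sum(abs(row[j] - pi[j]) for j in range(size)))
        row = [sum(row[i] * P[i][j] for i in range(size)) for j in range(size)]
    return out

D1, D2, D3 = (l1_distances(P, pi, k, 30) for k in range(3))
# Smallest n0 such that D1 < D3 < D2 holds for every n from n0 up to 30.
ordered_from = next(n0 for n0 in range(31)
                    if all(D1[n] < D3[n] < D2[n] for n in range(n0, 31)))
```

At time 0 the point mass at state 1 is the farthest from $\pi$, since state 1 has the smallest stationary probability, yet after a single step it is already the closest and remains so over the whole horizon checked.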

Remark 7. A generalization of Theorem 2 for non-negative (but non- 
stochastic) matrices is straightforward. 

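4. Proof of Theorem 2

We begin with several elementary auxiliary statements. First, we recall
formulas for powers of Jordan blocks. For a condition $A$, we use

$$\mathbf{1}_A = \begin{cases} 1, & \text{if } A \text{ holds}, \\ 0, & \text{otherwise}. \end{cases}$$

Lemma 1. 1. Let vectors $(w_{i_r})_{r=1}^m$ form a generalized eigenspace of
$P$ with eigenvalue $\lambda \in \mathbb{R}$, i.e.,
$w_{i_r} P = \lambda w_{i_r} + \mathbf{1}_{2 \le r \le m} w_{i_{r-1}}$. Then

$$w_{i_r} P^n = \sum_{k=1}^{r} \binom{n}{r-k} \lambda^{n-(r-k)} w_{i_k}.$$

2. Let $\lambda = \mu e^{i\phi}$, where $\mu > 0$ and $\phi \in (0, \pi)$,
and vectors $(w_{i_r})_{r=1}^m$, $(w_{j_r})_{r=1}^m$ form a generalized
eigenspace of $P$ with eigenvalue $\lambda$, i.e., for any
$a, b \in \mathbb{R}$,

$$(a w_{i_r} + b w_{j_r}) P = \mu (a \cos\phi - b \sin\phi) w_{i_r} + a \mathbf{1}_{2 \le r \le m} w_{i_{r-1}} + \mu (a \sin\phi + b \cos\phi) w_{j_r} + b \mathbf{1}_{2 \le r \le m} w_{j_{r-1}}.$$

Then

$$(a w_{i_r} + b w_{j_r}) P^n = \sum_{k=1}^{r} \binom{n}{r-k} \mu^{n-(r-k)} \Bigl[ \bigl(a \cos((n-(r-k))\phi) - b \sin((n-(r-k))\phi)\bigr) w_{i_k} + \bigl(a \sin((n-(r-k))\phi) + b \cos((n-(r-k))\phi)\bigr) w_{j_k} \Bigr].$$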
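The real-eigenvalue case of the lemma can be checked on a single Jordan block. The sketch below is an illustration with hypothetical helper names: it takes $w_r = e_r$, so that the block has $\lambda$ on the diagonal and 1 just below it, and compares the $r$-th row of the $n$-th power with the binomial formula $\sum_{k \le r} \binom{n}{r-k} \lambda^{n-(r-k)} w_k$.

```python
from fractions import Fraction as F
from math import comb

def jordan_row_power(lam, m, r, n):
    """Row vector e_r P^n for the m x m block with e_r P = lam*e_r + e_{r-1}."""
    # lam on the diagonal, 1 on the subdiagonal (row-vector convention).
    P = [[lam if j == i else F(int(j == i - 1)) for j in range(m)]
         for i in range(m)]
    row = [F(int(j == r - 1)) for j in range(m)]
    for _ in range(n):
        row = [sum(row[i] * P[i][j] for i in range(m)) for j in range(m)]
    return row

def lemma_formula(lam, m, r, n):
    """Coefficients of w_1..w_m in sum_{k<=r} C(n, r-k) lam^(n-(r-k)) w_k."""
    return [comb(n, r - k) * lam ** (n - (r - k)) if k <= r else F(0)
            for k in range(1, m + 1)]

lam, m, r, n = F(3, 4), 4, 3, 10
direct = jordan_row_power(lam, m, r, n)
closed = lemma_formula(lam, m, r, n)
```

Both computations use exact rational arithmetic, so `direct` and `closed` can be compared with `==`.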

Lemma 2. Let $u, v \in \mathbb{R}^N$. If $u \prec v$, then $H(u) < H(v)$
for any $H \in \mathcal{N}$. If $u = v$, then $H(u) = H(v)$ for any
$H \in \mathcal{N}$. If $u \ne v$ and $u \not\prec v$, then there is
$H \in \mathcal{N}$ such that $H(u) > H(v)$.

Proof: The first two statements of the lemma are trivial. It is sufficient
to prove the third one for the case where $u$ and $v$ are not proportional to
each other. To that end, let us take a linear bijection that sends vectors
$(2, 0, 0, \ldots, 0)$ and $(0, 1, 0, 0, \ldots, 0)$ to $u$ and $v$. The
pushforward of the Euclidean norm under this map satisfies the desired
property. □
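The construction in the proof can be made concrete in the plane. The sketch below is restricted to $\mathbb{R}^2$ for brevity, which is an assumption of this illustration, and `pushforward_norm` is a hypothetical helper. It builds the norm $H(x) = |A^{-1} x|_2$, where $A$ is the linear bijection sending $(2, 0)$ to $u$ and $(0, 1)$ to $v$.

```python
import math

def pushforward_norm(u, v):
    """Norm H on R^2 with H(u) = 2 and H(v) = 1, for non-proportional u, v.

    H(x) = |A^{-1} x|_2, where A is the linear bijection sending
    (2, 0) to u and (0, 1) to v.
    """
    # Columns of A are A(1, 0) = u / 2 and A(0, 1) = v.
    a, c = u[0] / 2, u[1] / 2
    b, d = v[0], v[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("u and v must not be proportional")

    def H(x):
        y0 = (d * x[0] - b * x[1]) / det
        y1 = (-c * x[0] + a * x[1]) / det
        return math.hypot(y0, y1)

    return H

u, v = (1.0, 0.0), (3.0, 4.0)   # Euclidean lengths 1 and 5
H = pushforward_norm(u, v)
```

Here $u$ is shorter than $v$ in the Euclidean norm, yet $H(u) = 2 > 1 = H(v)$, which shows why an ordering that is meant to hold for every norm cannot be decided by one norm alone.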

Lemma 3. Suppose $x, y \in \mathbb{R}^N$ and they are not multiples of each
other. Then there is $H \in \mathcal{N}$, a neighborhood $U$ of $x$, and a
constant $c > 0$ such that for all $z \in U$,
$\frac{d}{d\varepsilon} H(z + \varepsilon y)\big|_{\varepsilon=0}$ is well
defined and exceeds $c$.

Proof: Let us take a linear bijection that sends vectors
$(1, 0, 0, \ldots, 0)$ and $(1, 1, 0, 0, 0, \ldots, 0)$ to $x$ and $y$. The
pushforward of the Euclidean norm under this map satisfies the desired
property. □

Proof of Theorem 2: First, we notice that
$\Pi_{1,1} u = \Pi_{1,1} v = \pi = w_N$. This follows from the following
facts: $P$-iterates of $u$ and $v$ converge to $\pi$; $\Pi_{1,1} u$ and
$\Pi_{1,1} v$ are invariant under $P$; $P$-iterates of all other projections
decay exponentially.

Suppose that there are $\mu_0$ and $r_0$ such that conditions 1 and 2 of the
Theorem hold true. Decomposing $u$ and $v$ w.r.t. the canonical basis and
using Lemma 1 we immediately see that $u$ allows for faster convergence
than $v$.

Suppose now that $u$ allows for faster convergence than $v$. Let us choose
$\mu_0$ and $r_0$ so that $\Pi_{\mu_0,r_0} v \ne 0$ and if either (i)
$\mu \in (\mu_0, 1)$, or (ii) $\mu = \mu_0$ and $r > r_0$, then
$\Pi_{\mu,r} v = 0$.

Lemma 1 immediately implies now that condition 1 of the theorem is
satisfied. To prove condition 2, let us assume that the opposite holds, i.e.,
$\Pi_{\mu_0,r_0} u \not\prec \Pi_{\mu_0,r_0} v$. First, we consider the case
where $\Pi_{\mu_0,r_0} u \ne \Pi_{\mu_0,r_0} v$. Lemma 2 allows us to find a
norm $H \in \mathcal{N}$ such that
$H(\Pi_{\mu_0,r_0} u) > H(\Pi_{\mu_0,r_0} v)$. For any small neighborhoods
$U$ of $\Pi_{\mu_0,r_0} u$ and $V$ of $\Pi_{\mu_0,r_0} v$ we can use
Lemma 1 to find an infinite sequence $n_m \to \infty$ such that

(2) $\frac{\Pi_{\mu_0,r_0} u \, P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \in U, \quad m \in \mathbb{N},$

(3) $\frac{\Pi_{\mu_0,r_0} v \, P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \in V, \quad m \in \mathbb{N}.$

This is trivially true with $n_m = m$ if all the eigenvalues with magnitude
$\mu_0$ are real and equal to $\mu_0$. If the arguments of some of these
eigenvalues are not zero, then we can use the recurrence property of the
shift on the multidimensional torus induced by these arguments.

Using Lemma 1 to compute the leading terms of $(u - \pi) P^{n_m}$ and
$(v - \pi) P^{n_m}$ we see that

(4) $\lim_{m \to \infty} \frac{(z - \pi) P^{n_m} - \Pi_{\mu_0,r_0} z \, P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} = 0, \quad z = u, v.$

Therefore,

(5) $\frac{(u - \pi) P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \in U, \quad m \in \mathbb{N},$

(6) $\frac{(v - \pi) P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \in V, \quad m \in \mathbb{N},$

so that choosing $U$ and $V$ disjoint and sufficiently small and using the
inequality $H(\Pi_{\mu_0,r_0} u) > H(\Pi_{\mu_0,r_0} v)$ along with the
continuity of $H$, we conclude that $u$ does not allow for faster convergence
than $v$. This contradicts our assumption and therefore it remains to
consider

(7) $\Pi_{\mu_0,r_0} u = \Pi_{\mu_0,r_0} v.$

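Faster convergence for $u$ is clearly impossible in the situation where
$u = v$. Assuming $u \ne v$, we can find numbers $\mu_1$ and $r_1$ such that
$\Pi_{\mu_1,r_1} u \ne \Pi_{\mu_1,r_1} v$ and if (i) $\mu_1 < \mu < \mu_0$,
or (ii) $\mu = \mu_1$ and $r > r_1$, then $\Pi_{\mu,r} u = \Pi_{\mu,r} v$.

Let $U$, $H$, and $c$ be defined in Lemma 3 applied to
$x = \Pi_{\mu_0,r_0} u = \Pi_{\mu_0,r_0} v$, and
$y = \Pi_{\mu_1,r_1} u - \Pi_{\mu_1,r_1} v$ which is not a multiple of $x$.
Due to Lemma 1 there is a sequence of numbers $n_m \to \infty$ and
sequences of vectors $z_m$, $u_m$, $v_m$ such that

(8) $\frac{(u - \pi) P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} = z_m + u_m, \quad m \in \mathbb{N},$

(9) $\frac{(v - \pi) P^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} = z_m + v_m, \quad m \in \mathbb{N},$

where $z_m \in U$ for all $m$, and

(10) $u_m = \frac{\binom{n_m}{r_1-1} \mu_1^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \Pi_{\mu_1,r_1} u + o\!\left(\frac{\binom{n_m}{r_1-1} \mu_1^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}}\right), \quad m \to \infty,$

(11) $v_m = \frac{\binom{n_m}{r_1-1} \mu_1^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \Pi_{\mu_1,r_1} v + o\!\left(\frac{\binom{n_m}{r_1-1} \mu_1^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}}\right), \quad m \to \infty.$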

We can use relations (10) and (11) to derive

$$H(z_m + u_m) - H(z_m + v_m) = \frac{\binom{n_m}{r_1-1} \mu_1^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}} \, \frac{d}{d\varepsilon} H\bigl(z_m + \varepsilon (\Pi_{\mu_1,r_1} u - \Pi_{\mu_1,r_1} v)\bigr)\Big|_{\varepsilon=0} + o\!\left(\frac{\binom{n_m}{r_1-1} \mu_1^{n_m}}{\binom{n_m}{r_0-1} \mu_0^{n_m}}\right).$$

Since $z_m \in U$, Lemma 3 allows us to conclude that the derivative in the
r.h.s. of the last identity exceeds $c > 0$. Therefore, relations (8) and (9)
imply that $H((u - \pi) P^{n_m}) > H((v - \pi) P^{n_m})$ for all $m$, which
contradicts our assumption that $u$ allows for faster convergence than $v$.
Therefore, (7) is impossible and the proof of the necessity of conditions 1
and 2 of the theorem is complete. □



5. Concluding remarks 

We have shown that the tails of the distributions of hitting times for
different states of irreducible Markov chains and for typical initial
distributions can be ordered. This means that there is a finite moment of
time $n^*$ after which the tails of these distributions never intersect. This
property makes it possible to determine the optimal sink in a Markov chain or
in a dynamical network.

Our results hold for any nondegenerate initial distribution and in this
respect they essentially generalize those in [2], where only Lebesgue measure
was considered.

We also demonstrated that one can determine the best source in a Markov
chain. Again, this is a finite time result: a hierarchy of the states of the
Markov chain emerges with respect to their ability to serve as a source. For
a network, this suggests the node or element one should apply a perturbation
to, or inject information at, so that the perturbation spreads over the
network and converges to the stationary distribution in the fastest way. Our
results are also true (with obvious adjustments) if the matrix $P$ is
nonnegative and not necessarily stochastic.